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An improved estimator for the amplitude /nl of local-type non-Gaussianity from the cosmic 
microwave background (CMB) bispectrum is discussed. The standard estimator is constructed to 
be optimal in the zero-signal (i.e., Gaussian) limit. When applied to CMB maps which have a 
detectable level of non-Gaussianity the standard estimator is no longer optimal, possibly limiting 
the sensitivity of future observations to a non-Gaussian signal. Previous studies have proposed an 
improved estimator by using a realization-dependent normalization. Under the approximations of 
a flat sky and a vanishingly thin last-scattering surface, these studies showed that the variance of 
this improved estimator can be significantly smaller than the variance of the standard estimator 
when applied to non-Gaussian CMB maps. Here this technique is generalized to the full sky and 
to include the full radiation transfer function, yielding expressions for the improved estimator that 
can be directly applied to CMB maps. The ability of this estimator to reduce the variance as 
compared to the standard estimator in the face of a significant non-Gaussian signal is re-assessed 
using the full CMB transfer function. As a result of the late time integrated Sachs- Wolfe effect, the 
performance of the improved estimator is degraded. If CMB maps are first cleaned of the late-time 
ISW effect using a tracer of foreground structure, such as a galaxy survey or a measurement of CMB 
weak lensing, the new estimator does remove a majority of the excess variance, allowing a higher 
significance detection of /nl- 

PACS numbers: 98.70.Vc,98.80.Cq 

I. INTRODUCTION 

Over the past two decades our understanding of the physics of the early universe has gone from speculative to 
precise. We have numerous data sets which probe both the overall expansion of the universe as well as the statistics 
of the large-scale structure we see today. In addition to these observations, our standard cosmological model, with 
only six free parameters, provides a good fit to all of these data sets (see, e.g. [1]). The standard cosmological model 
relies on some basic assumptions about the origin and evolution of the universe: namely that soon after the big bang, 
the universe underwent a period of cosmic inflation during which nearly Gaussian perturbations were produced in an 
otherwise isotropic and homogeneous universe. After this period the universe was 'reheated'- i.e., populated with a 
thermal plasma of standard model particles. Particle physics dictates the interactions between the constituents of 
this primordial fluid, while general relativity dictates how the perturbations grow to form the structures we observe 
in both the clustering of galaxies, as well as in the anisotropies of the cosmic microwave background (CMB). 

With increasingly precise observations, we may test the foundations of the standard cosmological model. Here we 
will be concerned with testing the assumption that the statistics of CMB fluctuations are Gaussian. Although small 
levels of non-Gaussianity may develop through non-linearities in the standard cosmological model [2HS]i significant 
departures from Gaussianity can only be explained by changes to the fundamental physics of the early universe. 
For example, a detection of primordial non-Gaussianity could yield information on the interactions of the field (or 
fields) that seed the primordial curvature perturbation. If this field is the one that drives inflation (the inflaton), the 
measurement could yield insight into the detailed physics of inflation. If the curvature perturbation is seeded by a 
field that is sub-dominant during inflation (as in the curvaton scenario [THlOj). the primordial fluctuations may also 
be significantly non-Gaussian. Some more exotic possibilities, such as a non-canonical kinetic term for the inflaton 
[TTl - fl3] . spatially modulated reheating [14] ; and novel initial vacuum states for the fluctuations [15] . could also be 
probed by a detection or limit to non-Gaussianity in the CMB. 

There are an infinite number of ways in which the statistics of the CMB may be non-Gaussian, although the effective 
field theory approach shows that inflationary theories can only produce three forms for the non-Gaussian primordial 
three-point correlation function [11]. In this study we restrict our attention to the 'local model' of non-Gaussianity, in 
which the primordial curvature perturbation, can be written in terms of an auxiliary Gaussian field <j) as pi I16I - H5] 

<&(£) = m + /nl [4> 2 {x) - (^{x))] , (i) 

where the amplitude /nl parameterizes the level of non-Gaussianity. This model is particularly important because 
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if /nl 7^ were detected then all single- field slow-roll inflation models would be ruled out [2UJ [3T] . In addition, the 
local-type non-Gaussianity is predicted by the curvaton model [8l410j. and so a detection could provide support for, 
or constraints to, this model. 

Although there are several ways to estimate the level of non-Gaussianity in the CMB, an estimate of the harmonic 
three-point function (known as the CMB bispectrum) is the most sensitive [TH [25]. Given the large number of 
modes in the bispectrum (after restrictions due to statistical isotropy there are ^ ax terms in the bispectrum, where 
^max(— v/ A/pi X ) is the maximum multipole measured by a given experiment) a full exploration of the likelihood surface 
is computationally prohibitive, and so attempts to constrain the level of non-Gaussianity are made through estimators 
which have been constructed to minimize variance under the null hypothesis- i.e., when applied to CMB maps which 
are purely Gaussian (23] [24] ■ 

A direct application of the minimum- variance null-hypothesis (MVNH) estimator of /nl from the CMB bispectrum 
is also computationally expensive since the computation involves a very large number of terms (~ ^ ax ). The Planck 
satellite [35] will measure Z max ~ 2500 multipole moments so a blind application of this estimator will take ~ 10 15 
operations to compute! The real computational expense is even higher, as thousands of simulations must be run 
to characterize the statistics of this estimator. For a special set of non-Gaussian models (of which the local model 
is one) fast Fourier transforms (FFTs) may be used to greatly reduce this computational expense [551 127] ■ Using 
FFTs the estimator may then be evaluated with a computation time that scales as ^ ax , considerably reducing the 
computational expense (26j [28] . CMB data are currently consistent with vanishing non-Gaussianity [29, 30J; analysis 
of the WMAP 7-year data yields the 2-a constraint -10 < /nl < 74 Q]. 

If future data remain consistent with the null hypothesis, then the standard MVNH estimator should have the 
minimum variance. As the data improves, if /nl is found to be significantly different from zero, then the standard 
MVNH estimator is non-optimal and there may be other estimators which have a smaller variance (and hence increased 
signal-to- noise, S/N). This is because the MVNH estimator was constructed to have the minimum variance when 
/nl = and when applied to data with /nl the variance is no longer minimized. In particular, when applied 
to the local model, for /nl the MVNH estimator exhibits a variance which is proportional to / NL , leading to a 
saturation of the S/N of the estimator, even for large values of i max [3TM33| . This indicates that a new method to 
estimate /nl may extract a higher S/N from the measurements. 

In one approach, estimators are abandoned, and the full likelihood surface is explored using Bayesian methods |34| . 
Although this approach has been shown to give improved constraints on /nl, it is computationally expensive, taking 
over 150,000 CPU hours to compute a best fit-value and confidence interval for /nl with Z max = 512. In another 
approach, an improved estimator is constructed by defining a new realization- normalized estimator (RNE) [311 133] . 
This method normalizes the MVNH estimator using a particular combination of observed multipoles which are chosen 
so as to divide out the extra variance present. In Ref. [31] the RNE was derived in the flat sky and Sachs- Wolfe limit 
(in which the temperature anisotropics are given by fluctuations in the gravitational potential at the surface of last 
scattering, AT/T = — $/3 [35 ). In these limits, it was shown in Ref. [3T] that the RNE successfully removes all the 
variance proportional to / NL of the MVNH estimator when /nl 0. 

In this work we generalize the RNE to include the full radiation transfer function and provide expressions that are 
valid on all-sky CMB maps. We compute the variance of the RNE and find that, unlike in the Sachs- Wolfe limit, only 
~ 50% of the /NL-dependent variance is removed. 

The residual C(/nl) variance is a result of the late-time, large-scale integrated Sachs- Wolfe (ISW) effect, in which 
CMB anisotropies are generated by gravitational potential decay along the line-of-sight [3T>H42| . To understand why 
the late-time ISW effect reduces our ability to remove this extra variance, we note that the form of the local-model 
non-Gaussianities implies that the bispectrum signal is dominated by harmonic-space triangles which correlate one 
large-scale mode with two small scale modes (i.e., 'squeezed' triangles). If the large-scale anisotropies are generated 
only by the Sachs- Wolfe effect then they can be inverted to give a direct measurement of the value of the large-scale 
gravitational potential at the surface of last scattering. These measurements, in turn, allow us to directly infer the 
non-Gaussian contribution to the large-scale anisotropies, leading to a complete removal of all of the excess variance 
when /nl ^ 0. However, in the presence of both the late-time ISW and Sachs- Wolfe effect, this inversion is not 
possible, leading to a degradation of the performance of the RNE. We find, however, that if a tracer of foreground 
structure (such as a wide-field galaxy survey or the deflection field responsible for weak lensing of the CMB) is first 
applied to 'clean' a CMB map of the late-time ISW effect then ~ 80 — 90% of the /NL-dependent variance may be 
removed by using the RNE. 

We begin by summarizing the non-Gaussian local model and its bispectrum in Sec. [TT] We provide an intuitive, 
geometric argument for the origin of the /NL-dependent variance that afflicts the local-model MVNH estimator in 



Sec. Ill The realization-dependent normalization (RNE) is derived in Sec. IV using a method that applies in the 



presence of the full radiation transfer function. In Sec. [V] we compute the variance of the RNE using the full radiation 
transfer function, finding that the /NL-dependent variance of the MVNH estimator is only partially removed. In 



Sec. VI, we show that the /NL-dependent variance is further reduced by first cleaning the CMB of the late-time 
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ISW contribution using a large-scale structure survey [such as the NRAO VLA Sky Survey (NVSS), or the next- 
generation Joint Dark Energy Mission (WFIRST) proposal] or a measurement of CMB lensing. Conventions for 
flat-sky approximations employed in numerical calculations are stated in Appendix [A] Detailed expressions needed 
to obtain the variance are derived in Appendix [B] In Appendix C we present a computationally efficient algorithm 
(using fast Fourier transforms) to compute our estimator on full-sky CMB maps. 



II. THE NON-GAUSSIAN LOCAL MODEL 



First we will review the basic equations that relate to the definition and estimation of the CMB bispectrum, 
restricting attention to the local model as defined in Eq. ([fj. The CMB bispectrum is defined by 

nraitJisma _ / \ _ fimiro2trijt /n\ 

where the ai m are the usual multipole moments of the temperature map, bi^i 3 is the reduced CMB bispectrum, and 
the Gaunt integral is given by 



G m im2m3 = f d 2 hY my {fl)Y m _ (gji + 1) i 2l 2 + 1) (2^3 + 1) / h h h\( h h h | ,,,, 
( h h h \ 

and is a Wigner 3 J-coefficient. A product of three multipoles form an unbiased estimator for the 

V mi ni2 TO3 J 

angle-averaged CMB bispectrum: 

B hhh = H ( m \ m 2 2 m 3 ) a h mi ai 2m2 ai 3 m 3 (4) 



The minimum-variance null-hypothesis (MVNH) estimator for /nl is given by 

Dobs D 

D l 1 l 2 l 3 - D hhh 

SI S~1 S~1 ' 



/nl-^o 2^ n, n, r, ' (5) 



h<l2<h 

where 

(at m a*, m ,) = Ci5u>5 mm >, (6) 

and Bi x i 2 i 3 is the primordial (i.e., theoretical) angle-averaged bispectrum. The normalization of this estimator is given 
by the variance under the null hypothesis, er§ [33"] : 

al = V B ' lhh (7) 

— 4 A/. C/. C/, C/„ 

li<h<h 3 3 

where A^i^ = 1 if l\ 7^ I2 ^ I3, 6 if l\ = 1% = I3, and 2 otherwise. Finally, the angle-averaged primordial bispectrum, 
B; x ; 2 i 3 , is given in terms of the reduced bispectrum 



/ (2Z 1 + l)(2/ 2 + l)(2Z 3 + l) (h l 2 k\, fsl , 
B hhh-y ^ ^ JKhh- (8) 

Now restricting attention to the local-model bispectrum, at any radial location along the line of sight the primordial 
curvature potential, <&, can be decomposed into spherical harmonics. The local model ansatz in Eq. ([I]) then implies 
that 

<Wr) = Iimi (r) + /nl(-1)" 11 E E SrZT^^mAr^ismsir). (9) 

l 2 l 3 m 2 m 3 

This allows us to write the reduced bispectrum in a line-of-sight integral: 

b h i 2 i 3 = 2 ^ r 2 dra ;i (r)^ /2 (r)/3; 3 (r) + cyclic permutations. J , (10) 
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where the two filter-functions are given in terms of the transfer functions 



ai {r) = 2 I l^dkji(kr)Si(k), (ID 



7T 

A(r) = ^ J k 2 dkji(kr)Si(k)P(k), (12) 

and Si(k) is the CMB temperature anisotropy transfer function |44) . The generalization to polarization is straightfor- 
ward |43j ; in this work we limit our attention to the signature of primordial non-Gaussianity in the CMB temperature. 

Finally, we will need expressions for various auto and cross-correlations in terms of the distance along the line-of- 
sight r. The power-spectrum for the auxiliary Gaussian field cj> is 

(4>{k)4>*(%)) = (2ir) 3 6 ( V(k-k')P(k), (13) 

where 

d 3 k . ,-\ j£.g 



(2tt) 3 

These expressions lead to the line-of-sight autocorrelation [45 



0(&)e (14) 



(<f>im(r)<Pvm'(r')) = \,i'S m , m > J ^dkPikMkrlnikr') 

= x(r,r')6 u <d rnim >- (15) 
Similarly, the temperature multipole moments can be written as [261 146) 

aim = J r 2 drai(r)$i m (r). (16) 

With this we can write the line-of-sight cross-correlation 

{ai m <t>*, m ,{r)) = ~<5«'<!> mm ' J k 2 dkP(k)ji(kr)Si(k) 

= 5 u ,6 mm >Pi{r). (17) 
These expressions will be useful in the following discussion. 



III. THE ORIGIN OF THE INCREASED VARIANCE 

The standard non-Gaussian estimator written in Eq. ([5| is only optimal under the null hypothesis: /nl = 0. 
In the case where /nl the estimator becomes suboptimal (in the sense that it no longer saturates minimum 
variance attainable according to the Cramer- Rao bound, see Refs. [231 135) and has a variance which depends on / NL . 
Specifically, in the flat-sky Sachs- Wolfe limit the variance scales as [33 1331 HZ] 



where A is the amplitude of the power-spectrum of the primordial curvature perturbations, P(k) = Ak~ 3 . From this 
expression we can see that for /nl that in the large / max limit the variance flattens out and scales as ln _3 (^ max ). 

To understand the origin of this /NL-dependent variance let us first consider a simple toy model. Let di be a random 
variable with (a;) = and {aiCLj) = <J 2 5ij. As we will discuss further, the /NL-dependent variance in the /nl estimator 
comes from terms which look like 

•V X"'"- ( 19 ) 

ij 



— °o + ./nl u i> 

1 

~ 72A/ sky ^ ax ln(Z max ) ' JN "21n 3 (Z max ) 



^WUl ^ + / NL^37^ O ( 18 ) 
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The mean and variance of X are 

(x) = J2M = n<t 2 , (20) 

y 

(Ai) 2 ) = J2 («i%' a fe«/) - ^ 4 %^; = 2iW (21) 

where TV is the number of data points. From this, it is clear that the S/N of X is given by 

g if) ' (22) 

We can see that the S/N is a constant. The reason is simple: since the a, are uncorrelated, the off diagonal terms in 
X with i ^ j do not contribute to the signal; on the other hand, they do contribute to the noise, leading to a constant 
S/N 1 . We will see that something similar happens for the MVNH estimator. 

To understand the origin of the increased variance we expand the MVNH estimator in Eq. ([5| in powers of /nl 

Bb + /nlBi, (23) 

r _ „-1 a hm 1 a hm 2 a hm 3 p ( h h h \ , 0A \ 



with 



h<h<h m 1 m 2 m 3 1 1 J x 

I l! mi tt ! 2 m2 8 tni3 o ( W h h 

, , , C ;i C /2 C ;3 v mi m 2 m 3 



ft = „_ a E E %%^U B , M3 ( - - ;a ) +cyclicpermutations , (25) 



where 



NL 



r 2 drai(r)4>i m (r), (26) 
(-l) m E ]T ^r 1 ™ 2 / ^^(r)^ imi (r)0 i2m2 (r). (27) 



The total observed multipole moment is 

aim — a\ m + fm.a lm . (28) 

We now shed light on the origin of the increased variance by considering the flat-sky and Sachs- Wolfe limit of 
Eq. (25). In this limit the radiation transfer function is given by Si(k) — —ji(kr*)/3 where is the conformal 
distance to the surface of last scattering. We refer the reader to Appendix A for expressions which relate the full-sky 
to fiat-sky expressions. In these limits we have [5TI 155] : 

c > = 9^TT)< (29) 



where the primordial power-spectrum is taken to be scale invariant, P(k) = A/k 3 . Rewriting B\ in this way we can 
see that it is a 'product of two triangles with a shared side l\. We show a graphical representation of this in the top 
section of Fig. [T] We can rewrite B\ schematically as 



Bi=^W(i)AiAj, (31) 



h3 



Of course, if we were interested in estimating the variance we would choose a different weighting than in Eq. q 1 9p and use the estimator 
y"V a? which has the same signal but a reduced variance compared to Eq. |l9j. As we will discuss further, the increased variance in the 
/nl estimator cannot be reduced by choosing a new weighting. 



G 



with i labeling the triangle i = I2, I3} and j labeling the triangle j = {l±, rh, n}. 




FIG. 1: A graphical representation of the temperature configurations that are included in the Bi part of the bispectrum 
estimator, Eq. (25 1. The full shape is composed of two distinct triangles. Only one triangle contributes to the mean, but all 



triangles contribute to the variance, leading to a S/N which decreases slowly with l n 



The mean is given by a sum over a single triangle, as shown in the right panel of Fig. [TJ 



(32) 



Therefore, we can see that the mean is composed of a sum of (I^ ax ) 2 = Z^ ax terms. On the other hand, Fig. [T] shows 
that the variance of B\ is computed from a sum of a product of two triangles which gives (l^) 4 = /^ ax . As in the 
simple example given above, this shows that there are a large number of terms which do not contribute to the mean 
of £>! but do contribute to the variance, leading to a signal-to-noise (S/N) which does not decrease as fast as one 
might have expected with increasing Z max . 

It might seem that this can be corrected by choosing a different weight function W(i) = j* z - . Since this weight 
was chosen to optimize the estimator under the null (i.e., /nl = 0) hypothesis it is natural to think that a different 
weighting will be needed when /nl 7^ 0. Unfortunately, since the terms in B\ which contribute to the variance but 
not the mean involve contractions between (Zi ,^2) and (n,m), it is clear that no choice of weight, which depends only 
on the 'observed' indices I2, 13), will remove these terms. Therefore, a new weighting cannot decrease the variance 
of this estimator in the high S/N regime. Another way to remove the terms which do not contribute to the signal 
but do contribute to the variance is to use a realization dependent normalization. 

Before we discuss how to construct a realization dependent normalization that reduces the variance in B\ let us 
first demonstrate in what sense the squeezed limit dominates the variance in B\. In the flat-sky Sachs- Wolfe limits, 
it has been shown that the variance is well approximated by [33] 



((A8i) 2 )«| 



E 

h+l 2 +h=Q 



E^ 

si+s 2 =h 



J s 1 s 2 s 3 r 

c r 3 +s 3 ,o- 



(33) 



where {/} indicates the sum is over l\ + 1 2 + I3 — 0. We can calculate this by first writing it in a less compact form: 
In these limits the bispectrum is given by [23] 



fr/iMa — — 6{C/ 2 Ci 3 + Ci 3 Ci 1 + CjdCfc}. 
The second sum in Eq. ( 33 ) is given by 



Sl+S 2 =h 



1 



1 



s 2 (s 2 + l)Zs(2s + 1) ^(^3 + 1) s 2 (s 2 + l) 



(34) 



(35) 



and the summand is dominated by triangles where one index is much less than the other two (i.e., the squeezed limit) 
so that 



^ Cr 



si+s 2 =l 3 



(36) 
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We then have that 

((ASx) 2 ) oc att^A 2 V ' ^ ' C!7, 



li + 1% + £3 



We can see from this expression that it will be dominated by the terms where I3 is smallest. Therefore, taking the 
squeezed limit again we write I3 <C 1% — h and find that 

{(AB 1 ) 2 ) oc a 4 A 2 ^ cx ln(; max )- 2 , (38) 

min 



X 



where we have taken the limit £ max 3> l m in and used the expression for a Q found in Eq. (18). 

The numerical calculation found in Ref. [33J shows that the variance decreases slightly faster with ((AS1) 2 ) 
ln(Z max )~ 3 - non-squeezed configurations do make some contribution to the estimator which causes the variance fall 
off more rapidly with Z max . For other forms of non-Gaussianity, the C(/nl) term in the expansion of the estimator 



can not be written simply as in Eq. (25 1. An expression analogous to Eq. (301, however, may still be written though 
with a different weighting due to the shapes of the Fourier-space triangles which dominate the S/N in these cases. 
Since these models are not dominated by the squeezed limit, they are not afflicted by the slow scaling with / max of 
the S/N that occurs for the local model. 

IV. REALIZATION NORMALIZED ESTIMATOR 

In order to improve the /nl estimator when /nl 7^ 0, we will search for a realization dependent normalization which 
removes some of the variance in those terms in the estimator which do not contribute to the signal but do contribute 
to the noise. To do this we follow the approach presented in Ref. [31] and construct a new realization-dependent 
normalization which will remove as much of the /NL-dependent variance of B\ as possible. In other words, we wish 
to define a realization-dependent normalization that is highly correlated with B±. 

We write the new realization-dependent normalization as an estimator for B\ , B\ , so that we obtain a new estimator 
for /nl, the realization-normalized estimator (RNE): 

Bi 61 Si 

The extent to which the new realization-dependent normalization decreases the variance of this estimator is determined 
by the correlation between B\ and B\\ in the limit that these two terms are fully correlated the variance is 
completely removed. 



Looking at the equation for B\ [Eq. (25 )] the only term that cannot be immediately written in terms of observables 
is af^. Therefore, when constructing the estimator B\ we must find a minimum variance estimator for af^. To do 
this we define a weighted sum, 

-^NL \ s \ s Tx/mmimo / Ar\\ 

a lm=Z^ 1^ W lhh ahni^lzmi, (40) 
1^2 mitrij 

and demand that the weight minimizes the variance, 

' )=0. (41) 



^mmim 2 * 
U1I2 



-NL _ NL 
Hm a lm 



Choosing the weight 



W^Z 1 " 12 = "^V" / r 2 dr ai (r)(3 h (r)P h (r), (42) 



satisfies Eq. (41) and thus leads to a minimum variance estimator for a^. With this, the estimator for B\ can be 
written 

£ = *o~ 2 E E ^^r^ B lh i 2 ( 1 h h )+ c y clic P™ tations ' ( 43 ) 

u ^ ^ CiC h Ci 2 12 \mm 1 m 2 J J ^ y ' 



h<h<h mim 2 m 3 

_ -2 0-hm i ai 2 m2 a l a rn a O-l b m b „ „ I I h fo \ I I la h \ / aa\ 

- a o 2^ 2^ 2dC h Cud C h ° u ^ au ^ U mi m 2 )\m m a m b J ' ^> 



where we have made the approximation a\ m ~ a; m , which is true to lowest order in /nl- 

We note that there are several other ways of arriving at the same estimator for B\. In particular, if we were to 
instead search for a minimum variance estimator for $; m as in Ref. [33], then we would again be lead to the estimator 



for B\ in Eq. (44). Furthermore, the same relation between the underlying gravitational potential and the observed 
multipoles appears when calculating the shape of the likelihood surface for the MVNH estimator as discussed in 
Ref. El- 



As we now discuss, the improved estimator given by Eqs. (39 1 and (44) (the RNE), has two important properties. 
First we show that in the case where the transfer function is just the Sachs- Wolfe transfer function, the RNE is the 
same as the estimator presented in Ref. [31] . Our discussion then shows how the RNE, in this limit, is able to remove 
all of the /NL-dependent variance from the standard MVNH estimator. Second, when the full transfer-function is 
used, the combination of both the Sachs- Wolfe and late-time integrated Sachs- Wolfe (ISW) effects on large angular 
scales causes the RNE to be less effective; it only removes part of the /NL-dependent variance. Using other tracers 
of large-scale structure, as discussed in Ref. [3H|j allows us to improve the performance of the RNE, so that it can 
remove most of the /NL-dependent variance. Finally, in Appendix C we discuss how our estimator can be rewritten 
in terms of real-space quantities, so as to be computationally efficient. 

V. PROPERTIES OF THE REALIZATION-NORMALIZED ESTIMATOR 

To simplify the calculations in this section we work in the flat-sky approximation. Although the results presented 
will differ when compared to full-sky calculations, since we are interested in calculating the fractional reduction of 
the /NL-dependent variance (relative to the /NL-dependent variance of the standard MVNH estimator), we expect 
the differences to be small. 

To quantify the statistical properties of the RNE we must calculate the mean and variance of the ratios Bo/ (£>i) 

and B\j (B\). To do this we will use an approximate formula for the variance of the ratio of two stochastic variables. 
Let X\ and x% be two stochastic variables with means [i\ and p,2, variances o~\ and erf, and covariance p. We wish to 
calculate the mean and variance of 

W=— . (45) 

The probability density function (PDF) of W may be computed analytically if X\ and X2 are normally distributed 
[4T)] . However, the formula for the PDF is quite complicated and it does not immediately yield analytic formulae for 
the mean, (W), and variance ((W— (W)) 2 ) = (AW 2 ). Instead, we will derive an approximate expression. To do 
this we write X\ = fix + 8x\ and X2 = ^2 + 8x 2 - We then assume 8x1^/1^1,2 = e <C 1; this will be true near the peak 
of the normal distribution if fj,i^2 3> 01,2 (we have verified that the stochastic quantities discussed we are interested 
in satisfy this condition). We can write 

W « (1 + fai//ii - 8x2/1*2) + 0{e 2 ). (46) 



With this we can easily compute 



/'2 



(W) = £i (47) 
^2 

< AW/2 > = i + a i- 2 ^r- (48) 

Mi M2 M1M2 
A. The mean 

We wish to show that (Bi) = (^Bi^. To leading order in /nl in the flat-sky approximation, 

(61) = 4 E S %&Xi B m£cl I r " dra ^ {4 a \M^ k ,{r)) , (49) 
fi+f 2 +f 3 =o 1 2 

rA 2 c BjhMM) f 2, / > WMj / L L L L \ ( K a\ 

Bl ) = E frffr,n anacic J r h{r) c k c k , \ a zW a «) ■ ( 50 ) 
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The fact that |Zi| > 2 requires that the Wick contraction which contains { ( t ) k{ r ) < t ) k'{ r )) does not contribute to the 
mean. Therefore we are left with 

a r 3 a M r )H>( r )) = ( a r 2 h( r )) ( a Lh>( r )) + ( a Lh( r )) ( a r 2 <M r )) . ( 51 ) 

= Mr)h{r) [S r3 ,sh,-i> + k.-S'h.-i] ■ ( 52 ) 
Now to calculate the equivalent expression for B\. Again, as with B\ only the 'cross' terms survive and we have 



a T 2 a L a kH- ) = &U Whir) 5r-kk,~k> + S i 2 ,-k> 5 L,-k 



so we have that (B\) = \B\). With this we can conclude that the RNE is unbiased. 



(53) 



B. The variance 



The variance of the first term can be approximated by 



A 



go) 
(Bi), 



((So) 2 ) + {(Bo) 2 A(B 1 )A(B 1 ) 

((Bb)(B )> (i + (a(bT)A(bT))) 
a 2 (l + (A(BT)A(Sr))). 



The variance of the second term can be written as 

«•!)'; 



<A(Bi)A(B!)) + (A(Bi)A(iBi)) - 2 (A^A^)) . 



(54) 

(55) 
(56) 



(57) 



A tedious yet straight-forward calculation shows that the variance of the RNE is composed of four terms (we show 
the details of the calculation in Appendix A). Of those four terms one dominates in the Z max » 1 limit so that we 
have 



i-T. Wi G * 2 C fci G fc 2 C fc3 3 ' 3 



(58) 



{i},{k} 



A(Bi)A(Bi) 



J r 2 dr(r') 2 dr'a h (r)a kl (r')/3fc 2 (r')/3/ 2 (r) X i 3 (r, r') 

a 4 S(Zi,Z 2 ,i3)-B(fcl,fc 2 ,A;3) 

ocr 2^ ~ ~ ~ ~ ~ d r_ n. 



{i},{k} 



I r 2 dr(r') 2 dr'a h (r)a kl (r')/3fc 2 (r')A 2 (r) 



A(B 1 )A(B 1 )) = (A^OA^r)). 
The variance is given by 



(59) 



(60) 



B{l 1 ,l 2 ,h)B{k 1 ,k 2 ,k 3 ) 



5r r 



{;},{fc} 



2C tl C7, 2 C fcl C fc2 C fc3 ^ fe 3 



x y r 2 dr(r') 2 dr' a(l (r) afel (r')/3/ 2 (r)/3 fe2 (r')I? ; 3(^^), 



2?,(r, r') 



#(r)A(r') 

Xi(r,r) - 



(61) 
(62) 
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We are interested in calculating the fractional reduction of the /NL-dependent variance in the RNE, (/nl) N , relative 
to the /NL-dependent variance in the standard MVNH estimator, /nl- To quantify this we will define 



K 




(ABl). 



(63) 



If 1Z = then all of the /NL-dependent variance has been removed; for < 7Z < 1 then there is a residual /nl- 
dcpcndcnt variance which the RNE does not remove. 
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FIG. 2: The ratio of the fractional reduction of the /NL-dependent variance in (/nl) N as a function of the value of the I3 
multipole (which is the large scale in the squeezed limit). This figure was produced with Z max = 100 (we have checked that 
other choices for Z max reproduce curve). From this figure we can see that ~ 95% of the fractional reduction, 1Z, is contained in 
the quadrupole I3 = 2 justifying our approximation in Eq. (651. 



Looking at Eq. (61 1 we can see that the extent to which we reduce the variance of the RNE depends on the function 

l^l), we have 



Di(r,r'). In the squeezed limit (|Z 3 | ~ l m - m <§C \li\ 



oc ^X>/ 3 (r»,r») 




l 2 \ and |fc 3 | ~ l min < \ki\ 

B(? 1 ,/ 2 ,? 3 )S(fc 1 ,fc 2 ,Z 3 ; 



E 

{f},{fe} 



0/ 1 0; 2 C// 3 C/fe 1 0fe 2 



(64) 



where we have made the substitution r 2 a;(r) oc S(r — 7%) for I 3> l m i n ~ he., to a very good approximation, all small- 
scale power is sourced by physics that occurs at the surface of last scattering. This allows us to write the fractional 
reduction of the /NL-dependent variance as 



T>2{r*,r*) 
X2(r*,r*) ' 



(65) 



where we have fixed I3 — Z m ; n = 2 since that value contains upwards of 95% of the variance (see Fig. [2]). We have 
verified this approximation by calculating the full sum in Eq. (61) and found it to be accurate to a few percent. The 
function x;(r, r 1 ) is calculated using P{k) = Ak Tls ~' 1 , where we use n s = 1 to simplify the computation of the integral 
over k (the dependence on the actual value of n s is negligible). 



C. The /NL-dependent variance in the Sachs- Wolfe limit 

Before calculating the improvement in the /NL-dependent variance in the case of a ACDM universe, let us first 



analyze these expressions in the Sachs- Wolfe limit. In this limit a/(r) cx 5(r — r*). From Eq. (62) it is clear that the 
/NL-dependent variance will vanish (i.e., 1Z = 0) since it depends on 



2?,(r*,r0 = Xi(r*,r*) - Pf(r.)/G = 0. 



(66) 
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This result was found in Refs. [STJ ED 122] • 

We can understand this result in a slightly different way which will highlight how the Sachs- Wolfe limit is unique. 
In the Sachs- Wolfe limit the estimator af r ^ is fully correlated (up to order Af^ L ) with af^ since 

NL 
Im 



a 



3 E E (- i rGuT im2a ^ a ^ 2 , (67) 

l-il 2 mini] 



NL 



~ln E (- i ) m ^;r im2 ^ 1 ™ 1 ^)^™ 2 (^)+°( A /NL) (68) 

l t l 2 m 1 m 2 

E E (- 1 ) m ^7 1 T 2 mim2 / r 2 dr ai (r)<P hm Ar)^ 2m2 (r) (69) 
= 4E E (-ir^ , " iroa ^-i( r *)^ a (rO. (70) 

hl 2 ■m 1 m 2 

In a ACDM universe, however, the large-scale anisotropies receive power from both the Sachs- Wolfe as well as the 
late-time integrated Sachs- Wolfe (ISW) effects. This additional contribution to the large-scale power degrades the 
correlation between af£ and af^ thus leaving some residual /^-dependent variance. 



D. The /NL-dependent variance for the full ACDM transfer function 



We are now in a position to calculate the reduction of the /NL-dependent variance using the realization-dependent 
normalization in a ACDM universe. As we already discussed, the value of the ratio T>i (r* , ) /\i {t* , r* ) at the 
quadrupole (I = 2) gives a good estimate for the fractional reduction of the /NL-dependent variance, 1Z. 




r (Mpc) 

FIG. 3: The transfer function, r 2 «;(r), for the quadrupole (I — 2) as a function of conformal distance in Mpc. The left-hand 
panel shows the effects of late-time ISW. The dip around r « 10 4 Mpc corresponds to reionization at z = 11. The right-hand 
panel highlights the evolution of r 2 ai(r) around decoupling. Note that the scale is not the same in both panels. 



In Fig. [3] we show how the quadrupole hlter a%{r) [see Eq. ( 11 )] depends on the line-of-sight distance r in a ACDM 
universe. First note that there is a relatively large rise below r < 10 3 Mpc, corresponding to the late-time ISW effect. 
Furthermore at r ~ 2 x 10 4 Mpc, the filter shows a sharp dip corresponding to the contribution from the surface of 
last scattering. Therefore in a ACDM universe on large scales the observed multipoles a; m have contributions from 
two separate epochs: the ISW effect at late-times and the Sachs- Wolfe effect around the surface of last scattering. 
This two-epoch contribution is central to understanding how the RNE works when applied to a ADCM universe. 

To investigate how the late-time ISW effect impacts the improvement of the RNE, we modified the publicly available 
Boltzmann code CAMB [SD] to include a new parameter, AigW) which controls the amplitude of the late-time ISW 
effect in the line-of-sight integral When Aisw — lit takes on its standard value in the calculation of the C;s; when 
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FIG. 4: Left: The realization-dependent normalization reduces the /NL-dependent variance to varying extents depending on the 
level of the late-time ISW effect. From top to bottom we plot the function Di(r*,r,)/xi{r*,r,) with varying levels of late-time 
ISW: 1, 0.5, 0. The overall reduction of the /NL-dependent variance roughly corresponds to the value of T>2(r,, r*)/x2(r,, r*). 
Right: The fractional improvement in the /NL-dependent variance, 1Z, as a function of the amplitude of the late-time ISW. 
When the late-time ISW is completely removed (i.e., Aisw — 0) the /NL-dependent variance is nearly completely removed; for 
an unmodified ISW level the /NL-dependent variance is reduced by a factor of ~ 0.5. 



it is zero the late-time ISW effect is absent. In Fig. [4]we show T>i (r* , r* (r* , ?"* ) as a function of multipole, I, with 
varying late-time ISW amplitude, Aisw- On the right hand side of the figure, we can see 1Z ~ T>2 (r* , ) /\2 {r* , ?"* ) 
falls off sharply as Aisw decreases. Qualitatively this result can be understood by noting that with both the Sachs- 
Wolfe and late-time ISW effects contributing to the large-scale anisotropies our estimator becomes less correlated 
with the actual af^ leading to a larger value for the fractional reduction 1Z. 

In order to better understand how the late-time ISW effect limits our ability to remove the /NL-dependent variance 
quantitatively we can approximate the anisotropies on large-scales by |51j 

aim - ~^im(r*) + Aj SW $ lm (ri SW ), (71) 

where Aisw controls the contribution of the ISW effect to the temperature multipoles, and nsw is the conformal 
distance from the observer to the peak of the ISW visibility function. The ISW contribution extends from ~ 500 Mpc 



to 4000 Mpc (where the present time is Mpc) . Eq. ( 71 ) explicitly shows how the contribution of the late-time ISW 
effect to the observed multipole moments ai m limits our ability to reconstruct af^. Since the correlation between the 
Sachs- Wolfe and ISW terms is at most 10% we neglect it and we have 

P 2 (n 1 n) CT m 

7 X 2 (r*,n)- Cf w + ^ swC isw> M 

where Cf w and C ; ISW are the Sachs- Wolfe and the late-time ISW contribution to the power spectrum, respectively. 
As shown on the right-hand panel in Fig. |4j in a ACDM universe with all of the late-time ISW effect included, the 
/NL-dependent variance can only be reduced by a factor of 0.5 using the realization-dependent normalization defined 



in Eqs. ( |39| ) and (43). When the late-time ISW contribution is completely removed, the /NL-dependent variance is 
reduced by a factor of 20 (1Z = 0.05). This raises the question of whether or not probes of large-scale structure closer 
to the present epoch can be used to 'remove' the late-time ISW contribution to the CMB anisotropies and further 
reduce the /NL-dependent variance. Next we discuss how we may use such a tracer of the ISW effect to further reduce 
the /NL-dependent variance of the RNE. 



VI. ISW SUBTRACTION WITH FOREGROUND TRACERS 



We have identified the late-time ISW effect as the cause of the residual variance scaling as / NL . If the late-time 
ISW component of a temperature map could be estimated, and then cleaned out from the data, we expect that the 
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corresponding generalization of the RNE would then be nearly free of the /NL-dependent variance. 

Since the bulk of the late-time ISW effect in a ACDM cosmology comes from redshift z ~ 1, a measurement of the 
large-scale gravitational potential or density field around this redshift should yield information that we can use to 
clean our map of the ISW effect. The use of an ISW-cleaned map to improve the sensitivity of the CMB to primordial 
non-Gaussianity was first proposed in Ref. |48| . In that work, only the zeroth order variance ((AZ?o) 2 ) was computed, 
and as a result the improvement in the S/N obtained by using ISW-cleaned maps was marginal. In contrast, here we 
will see that using a large-scale structure tracer has the potential to reduce the /NL-dependent variance by up to ~ 
90%. 

For an arbitrary large-scale tracer ti m , the ISW-cleaned CMB temperature anisotropy is 



/ isw f * \ 

\ a lm l lm/ 
(^"i t* m ) 



(73) 



where a}™ is the portion of the total multipoles, a; m , due to the late-time ISW effect, t[ m is the multipole associated 
with the large-scale tracer field, a superscript c indicates a quantity that has been 'cleaned' of the late-time ISW 
effect, and a superscript t indicates a quantity associated with the tracer field. We note that if we use the lensing 
potential for CMB weak lensing as our tracer field then ti m is obtained from some higher-order cumulant of the data. 
This then implies that the tracer's power spectrum must also take into account an additional noise term, as we discuss 
in more detail below. 



The RNE takes the same form as in Eq. (39) but written in terms of ISW-cleaned quantities: 



Cf 



C / \ 



ai{r) 



£,T,t£,ISW,t 

) I I 



Cf 



#(r), 



(74) 

(75) 
(76) 



where a\{r) and P\(r) are denned as in Eqs. (11) and (12) but with in terms of a transfer function corresponding to 
the power spectrum of the tracer field. In order to evaluate the fractional reduction of the /„/-dependent variance 
with an ISW cleaned CMB map, we evaluate Eq. ( p3lj ) making the identification cti(r) — > of(r), Pi{r) — > Pf(r), and 



Ci — > Cf. Now that we have expressions for the fractional reduction of the /NL-dcpcndcnt variance of an estimator 

/nl built from ISW-cleaned maps, we consider two specific tracers of foreground structure: weak lensing of the CMB 
and galaxy surveys. 



A. Reconstructed lensing potential 



Weak lensing deflects the trajectories of CMB photons, remapping the temperature and polarization fields. The 
projected potential Vl'(n) determines the trajectories of lensed CMB photons. A reconstruction of the projected 
potential field from measurements of the CMB would allow a separate probe of large-scale structure up to redshifts 
of ~ a few. 

Referring to the solid curves in the left-hand panel of Fig. [5] we show the correlation coefficient between the lensing 
potential and the late-time ISW effect, as well as the correlation coefficient between the deflection and full temperature 
fields. We see that the multipole moments of the deflection field are very strongly correlated with the late-time ISW 
effect. The fractional reduction of variance obtained with the realization-dependent normalization for the perfectly 
(via lensing) cleaned maps is controlled by the ratio P*(r*, ?~*)/xz(r*, r*) is shown in the right-hand panel of Fig. u3 
We find that the realization-dependent normalization removes ~ 90% of the / NL variance if the map is 'cleaned' o : 
the late-time ISW effect using a perfect reconstruction of the CMB deflection potential. 

Of course, even with a idealized noiseless CMB map, the deflection potential can not be perfectly reconstructed. 
The lensing estimator relies on off-diagonal correlations of temperature multipoles, and in any given realization of the 
power spectrum, there will be some overlap between the lensing estimator and chance correlations that arise from 
cosmic variance. This leads to the reconstruction noise variance N [48 l 152 ) . 

On the large scales relevant for ISW subtraction, the bulk of the S/N for lensing construction comes from very 
high I, and so instrument noise can be neglected in our estimates. We make the replacement Cf — > Cf + N 
above to assess how much a realistic reconstruction of VP (and subsequent cleaning of the CMB temperature map) 
could improve the performance of the RNE. We assume a reference experiment with temperature and polarization 
noise AT = AP/y/2 = 1 A*K arcmin, and an angular resolution of a = 4', as discussed in detail in Ref. [S3]. We note 
that measurements using Planck will not be sensitive enough to realistically use the reconstructed lensing potential in 
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FIG. 5: Left : The three black curves show the correlation between the lensing potential and the late-time ISW effect in the CMB 
temperature: the solid curve shows the correlation without taking into account lensing reconstruction noise; the dot-dashed 
curve shows the correlation between the late-time ISW effect and the reconstructed lensing potential using EB correlations, 
including reconstruction noise; analogously the long-dashed curve shows the same correlation, but using TT correlations to 
reconstruct the lensing potential. The two red curves show the correlation between the forecasted Euclid/WFIRST (dotted) 
or NVSS (medium dashed) density fields, and the late-time ISW effect. Right: The fractional reduction in the quantity T>i for 
the uncleaned CMB maps (cyan, medium-short dashed) and cleaned maps with line-types corresponding to the curves in the 
left panel. 



order to remove the late-time ISW effect. From the right-hand panel of Fig. [5| we see that if the temperature field is 
used as to reconstruct the deflection field, the RNE can remove ~ 70% of the variance proportional to f^ L , whereas 
if the polarization field (through the EB correlation) is used, the RNE can remove ~ 80% of the excess variance 2 . 



B. Galaxy survey 



Alternatively, a galaxy survey with peak redshift near z ~ 1 also probes the potential field at the epochs when the 
late-time ISW effect is imprinted on the CMB. For a galaxy survey with selection function w(z), the transfer-function 
is given by 



Sf(k) 



drb k (r)S^(r)w[z(r)]ji(kr) 



(77) 



where bfc(r) is the bias, z(r) is the redshift as a function of conformal distance, and S™(r) is the time and scale- 
dependent function which maps the primordial potential to the evolved matter density. We have taken [41] [42] 



w(z) = CH(z) 



exp 



(78) 



where for the NVSS survey we take zq — 0.79, 13 — 1, and m = 1.18 as in Ref. [51j. On the other hand, for a futuristic 
space-based all-sky dedicated dark energy, such as that described in the Joint Dark Energy Mission (WFIRST) [55] 
and Euclid mission concept [56] . or the Large Synoptic Survey Telescope [55] (LSST) we take z — 0.5, f) = 1, m = 2 
[55j . The presence of the Hubble parameter converts the selection function from galaxies per unit redshift, to galaxies 
per unit conformal time. We do not need to specify the normalization constant C, as it divides out of all quantities 
of interest. 



2 Of course in a more complete analysis, we could include other correlation functions (EE, TE, EE, and BB) in the reconstruction of the 
deflection field. We choose to focus on TT (since TT-based detection of CMB weak lensing have already been made) and EB (because it 
vanishes in the null hypothesis of no lensing, and thus provides a strong probe of the deflection field). These other correlation functions 
could themselves be used to look for non-Gaussianity- indeed, Refs. |23) and [43] introduce the MVNH that includes all of them. In 
this work, we restrict ourself to the simple case discussed already and leave a more complete analysis for future work. 
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We approximate the bias as constant in scale as well as in redshift so ^ na ^ ^ factors out of the computation 

completely 3 . Referring to the dashed- lines in Fig. [HJ we show the correlation coefficient between large-scale structure 
(for both NVSS and Euclid/WFIRST parameters) and the late-time ISW effect, as well as the correlation coefficient 
between large-scale structure and the full temperature field. We see that the multipole moments of the large-scale 
density field are very strongly correlated with the late-time ISW effect. 

The fractional reduction in the variance obtained with the RNE for the cleaned maps is shown in the right-hand 
panel of Fig. [5j We find that the realization-dependent normalization removes ~ 75% of the / NL variance if the ISW 
effect is removed by cross-correlating a CMB map with a large-scale structure survey for the NVSS survey parameters, 
and ~ 90% of the variance if WFIRST survey parameters are assumed. In Fig. |6j we see how the S/N for /nl is 
improved if prior to the application of the realization-dependent normalization, the late-time ISW effect is removed 
from maps using two different secondary tracers. This figure was produced assuming the variance of the standard 



MVNH estimator given in Eq. (j 1 S|) and the reduction in the /NL-dependent variance, 1Z, using various large-scale 

(79) 



structure tracers to 'clean' the CMB maps of the late-time ISW effect 

S /nl 



'KL 



72A/ sky (2 iax ln(/ max ) ^ 21n 3 (Z ma , x ) 

Although the scaling with l max and /nl given in the above equation was calculated in the flat-sky Sachs- Wolfe limit 
[3Tl[33l|47], it has been shown to reproduce the Z max scaling calculated on the full sky and with the full transfer-function 
[251152] . However, we note that the exact numerical factors may be different for the full sky and full transfer-function 
case. It is thus possible that the /NL-dependent variance may be even more important at lower /nl than is indicated 
by this expression. As future data is used to determine the level of non-Gaussianity in the CMB, the statistics of the 
standard null-hypothesis estimator should be checked, especially if the data starts to indicate that / NL 7^ 0. 

We see from this figure that if primordial non-Gaussianity is of local type and /nl ^ 5, a significant improvement 
in the S/N for /nl may be obtained by using cleaned maps. Of course in a real galaxy survey, there are additional 
complications due to incomplete sky coverage, photometric redshift errors, and shot-noise due to a finite number 
of galaxies in the survey volume. Here we neglect these important real-world effects to highlight the fact that a 
measurement of the ISW effect can help reduce the /^ L variance in the RNE. An increase in the S/N in an estimate 
for /nl would not only lead to a more precise determination of the level of non-Gaussianity in the CMB, but would 
also lead to a more precise estimation of additional parameters (such as a possible scale-dependence) associated with 
primodial non-Gaussianity. 



VII. CONCLUSIONS 



We have investigated the origin of the /NL-dependent variance when applying the standard MVNH /nl estimator 
to CMB maps with appreciable non-Gaussianity. We found that this variance is due to terms that appear in the 
estimator which do not contribute to the signal but which do contribute to the noise. 

Previous work in Ref. 134) has shown that a Bayesian analysis has the potential to provide an estimate of /nl from 
the CMB which does not show an /NL-dependent increase in the variance when applied to maps with appreciable non- 
Gaussianity. That approach, however, is computationally expensive and quite inefficient, taking 150,000 CPU hours 
to compute the estimator on a simulated non-Gaussian CMB map. It is therefore desirable to find a computationally 
simple and efficient method to estimate /nl from the CMB which remains optimal even when applied to maps with 
appreciable non-Gaussianity. 

We have found that a new realization-dependent estimator (RNE) can be constructed which is computationally 
efficient (utilizing the scaling properties of fast-Fourier transforms) and reduces the /NL-dependent variance by a 
factor of ~ 2. Previous studies have shown this same realization-dependent normalization can completely remove 
the /NL-dependent variance in the Sachs- Wolfe limit. When the full transfer-function is used, however, this limit is 
a poor approximation to the CMB power-spectrum, especially at large-scales where the late-time ISW effect (due to 
late-time acceleration) contributes about half of the power. We have artificially reduced the level of the late-time ISW 
and found that when it is completely removed our new estimator has negligible /NL-dcpcndcnt variance even with the 
full transfer function. This implies that by using a tracer which effectively removes the late-time ISW contribution 
to the CMB map we can use the RNE to further reduce the /NL-dependence. 



In reality, local- type non-Gaussianity would also induce scale-dependent bias 1571 . and thus corrections to the variance of the cleaned 
map. This effect, however, would be higher order in /nl, so we may neglect it without loss of generality. 
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FIG. 6: The signal to noise (S/N) in an experiment with Z max = 2500 and / s k y = 0.8 (corresponding to Planck) as a function 
of /nl under the flat-sky and Sachs- Wolfe approximations. The top (solid) curve shows the S/N for the standard MVNH 
estimator without taking into account the /NL-dependent variance. The curve second from the top (dot-dashed) shows the 
S/N for the RNE after a tracer field has been used to remove most of the late-time ISW effect, leading to a reduction of the 
/NL-dependent variance by a factor of TZ = 0.1. The curve third from the top (dotted) shows the S/N for the RNE using 
only CMB data with TZ = 0.5. The bottom curve (long-dashed) shows the S/N for the standard MVNH estimator with the 
(full) /NL-dependent variance with TZ = 1. As this figure shows, the cleaned maps can increase the S/N by several standard 
deviations. 



We considered two tracers of large-scale structure: the deflection field (which generates lensing in the CMB) and 
a large-scale structure survey (considering survey parameters comparable to those of NVSS and Euclid/WFIRST) 
with a mean redshift of z ~ 1. Both tracers are highly correlated with the late-time ISW. We find that by using the 
deflection field as a tracer, we can reduce the /NL-dependent variance by a factor of ~ 0.1 using a futuristic CMB 
experiment (the reconstruction of the deflection potential from Planck is not accurate enough to be useful). If the 
large-scale structure measured by NVSS is used as a tracer, the variance could be reduced by a factor of ~ 0.25, while 
a next-generation mission like Euclid/WFIRST could reduce the variance by a factor of ~ 0.1. 

We show the improvement in the S/N for the estimation of /nl using the RNE assuming a satellite experiment 
like Planck (^ max = 2500, / s k y = 0.8) in Fig. [6] If the data indicate that /nl ^ 0, then the estimator discussed here 
can be used to increase the S/N of the detection, thus shedding new light on primordial non-Gaussianity. 
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Appendix A: Flat-sky approximation 

To speed computation throughout the paper (by allowing the use of FFTs along both dimensions of the sky), we 
use the flat-sky approximation as described in Refs. [331 158j . while providing whole-sky expressions for use on real 
maps. In terms of the fractional temperature perturbation T(fi) at position h, the temperature power spectrum C\ is 
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jiven by 

(%%) = ^r 1+ f 2 ,cA ( A1 ) 

d 2 6 e- lt3 T0) ~ — - J2 e~ llS T0), (A2) 



iV plx 



where = 47r/ s k y is the survey area (in steradians) , and <5f + f Q is a Kronecker delta that sets l\ = — 1% and we use 
the convention S 2 (l) — CISf to convert between Kronecker and Dirac-(5 functions. We convert between discrete sums 
and integrals using the correspondence S ; -o f2 / d 2 Z/ (27r) 2 [55]. We use the convention 

P(k) = Afc"^ 4 , (A3) 

and assume a scale-invariant power spectrum (n s — 1) for the duration of this paper. For a scale- invariant primordial 
power spectrum in the Sachs- Wolfe approximation, the angular power spectrum for T(9) is |23j : 

c < = m+Ty < A4 > 

where we take the amplitude A = 2tt 2 A| ~ 2.43 x 10~ 9 x 2tt 2 ~ 4.7 x 1(T 8 lj. In the flat-sky limit, the reduced 
bispectrum bi x i 2 i 3 is given by 

%%%) = Q6 h+h+hfl h hl*h- ( A5 ) 



The expression for 6i 1 / 2 ; 3 in Eq. (10) itself [in terms of a;(r), /3i(r)] is unchanged. The Kronecker delta insures that 
the bispectrum is defined only for l\ + l 2 + I3 = 0; i.e., only for triangles in Fourier space. Statistical isotropy then 
dictates that the bispectrum depends only on the magnitudes l\, I2, h of the three sides of this Fourier triangle. To 
derive various expressions in the text, we will use the flat-sky equivalent of the Wigner-3J coefficient [52"] : 



(«! + !)(«» + + D ({; b h )(h h k\ s (A6) 

4-7T 1 M mi m 2 m 3 J h+h+hfi v ' 

In the flat-sky limit, the MVNH is given (applying the same arguments used to derive it in the whole-sky case) by 

[231 EB] 

-r~ _ 2 v- a h a h a r 3 b hhi 3 

~ 2. 6WC h C l2 C l3 ' (A7) 
h+h+h=o 



and it has inverse variance, 



, S 6toc h c h c u ' (A8) 

h+h+h=0 



Appendix B: Details of the calculation of the variance 

We will first compute the variance of B\ . To do so we will concentrate on the part of the variance 
J r 2 dr{r'ydr' ail {r)a mi {r')(a r2 a r ^^ 

We will first concentrate on identifying the various types of terms that will arise from the expectation value. First 
note that the Kronecker deltas require that the only 'internal' contractions can be between the / or fh terms with the 
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ks or ps. Therefore the only contractions on the 'off-diagonal' [in which all a,-s are contracted with 0/'(r)] are: 



A 2 = («r 2 ^W)<^ 2 W>(a r3 ^(rO)<^(r)a^>, 

= S l 2 ,-^l2 ( r )fe 2 ,-f/ 9 m 2 ( r ') 5 l 3 ,p> Ph ( r ') 5 rh 3 ,k'P m 3 (r). 



(B2) 
(B3) 
(B4) 
(B5) 



There are 4!=24 'diagonal' contractions; however, not all are unique since the sum is symmetric in (lx,l%), (k,k'), 
(mi, 777,2), (PilO- Representing these pairs by numbered boxes in Fig. [7] we show the five unique combinations that 
will make up the variance. The other three unique terms are: 



A, 




A 3 



A, 




A 5 




FIG. 7: A graphical representation of the 5 combinations of 1 = (h,h), 2 = (k,k'), 3 = (772,2,777.3), 4 = (p,p) which make up 
the variance of B\. 



^3 = (a & a^ a )( ar8 o^)(^(r)^r')><^,(r)^(r')> (B6) 

= C iA 2 ,A 2 C hh 3 ,rn 3 Xk(r,r')S %pX k'{ry)Sp^. (B7) 

Ai= (ar 2 a; a2 )(a r3 ^(r'))<^(r) a t 3 )<^,(r)01,(r)) (B8) 

= ^^^As^O^^sW^Xfc'^r-O^^. (B9) 

A 5 = (a & ^r / )}( a& ^(r))<0g(r)o^><^(r / )o5 is ) (BIO) 

= A 2 (^0% 2 ,A( r ) (5 r 3 , P -^( r )% > m/7n3(r-0^3, fc -. (bii) 

Therefore, in the end we have five unique combinations: 

M = a^^A.W^^^CrO^.-pXfe'^r')^,^, (B12) 

A 2 = A 2 (r)^,.^ (r)^^/3 lB (r')* f8 ^ J 8 ma (B13) 

^3 = a^ r2 ^ 2 G 3 5 r3 ^ Xfc (r,r0^X fc '(r^0%^, (B14) 

^ = Q.^^/^W^^CrO^^^r-')^^, (B15) 

^5 = AsM^/W^W^W^^W' (B16) 
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The last term, As, has the Kronecker deltas <5,- -,Sr - which implies the full term will have the Kronecker delta 
Sf i+ f 2+ f 3 <5jj + p-/ f^fg g,Sf 2 - so that when summing over p and p 1 we will have h = P and Z3 = p so the final term will 
be zero since > 2. Therefore A 5 = and we are left with four unique terms, which agrees with the appendix in 
Ref. [53]. The same approach can be taken with the variance of B\ 

)2 ^ Pk(rWk'{r) P p (r')f3 p ,(r') 
and the covariance between B\ and B\ , 



^ Bl) / C^CV Cp~&p> \ a ^ a h a k a k' \ a *n 2 a*rn 3 a}a}' ) , (B17) 



(a&A^) ~ M ^ r) (a r2 a r3 a^,|< n2 at 3 01(/)^(r')) . (B18) 
Computing these terms shows that certain contractions dominate the sum so that |33j 

(A(B 1 )A(B 1 )) = 8 £ r k r hl r kl r k r 6 hM ( B19 ) 



^ C h C h C kl C k2 C k3 1 ' 

{f},{fc} 



x J r 2 dr(r') 2 dr'a ;i (r)« fel (r')/3 fc2 (r')A 2 W Xi3 (r,r'), 

A(ft)A(ft)) = 8 J2 r h r 2l r kl r k r ( B2 °) 
{i},{k} 

x /" r^r') 2 *'^ (r)a fcl (r') fa (r')fa (0 , 
A(B!)A(^)\ = (A(B 1 )A(B 1 )) . (B21) 



Appendix C: Fast algorithm to compute RNE 

As noted in Rcfs. 26, 28 , due to the inefficiency of harmonic transforms, the MVNH estimator is expensive to 
evaluate, requiring the computation of ~ ^ ax terms. However, a more efficient computational algorithm can be used 
once the MVNH estimator is written in terms real-space quantities- once this is done, the azimuthal part of the 
harmonic transform can be computed using computationally efficient fast-Fourier transforms (FFTs). As is noted in 
Ref. [26] the MVNH estimator can be rewritten as 

fm. = < 2 J r 2 dr J d 2 nB 2 (n,r)A(n,r), (CI) 
A(n,r) = y WQlWftfrim (C2) 



ft. 



B(n,r) = J2 mYl ^ )aim . (C3) 

Ira 

At each location along the line of sight, the resulting estimator only requires the computation of ~ Z^ ax terms. In 
addition to this, the filter functions ai(r) and /3i(r) are sufficiently smooth so that they must only be evaluated for 



0(100) grid points. In Sec. IV we generalized the realization-dependent normalization of Ref. [31] to treat the full 
sky and include the radiation transfer function. Here we show how we may rewrite this estimator in order to utilize 
FFTs to speed up their computation. 
The estimator is given by 

B i-°o ^ 2, 2QQ^C^ B lhl2 B llalb [ m mi m2 )[ m ma mb )- « 4) 



l<h<h,l a h ra\ra 2 m a m h ra 



Using Eq. (J3j) , ( |C2[ ) , and ( C3 1 , we may rewrite Eq. ( C4 ) in a form amenable to rapid computation using filtered 
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real-space maps and FFTs: 



Bl - <7 ^ ^ , (C5) 



ll. 



2Cj 

V; m = J drr 2 ai {r)B { ^{r), B^(r) = J dnY; m (n)B 2 (r,n), (C6) 
E/im = / drr 2 /3 ; (r) / dhY? m (tl)A(r,h)B(r,ri). (C7) 



Written this way, the normalization of the estimator may be computed from a map, and may be efficiently evaluated 
using FFTs, although the computation of V/ TO via numerical integration does introduce a new bottleneck. The 
operation count for these procedures is ~ Imax In (Zmax)^Vint where A^ nt is the number of points sampled along the 

line of sight for the radial integral. The imax In (^max) scaling follows from the computational expense of a harmonic 
transfor m (in which an FFT is used for the azimuthal Fourier transform piece). In contrast, the direct evaluation 
of Eq. (44) would require evaluating and summing ~ i^ ax terms. The computational savings is then a factor of 



^max/( m T^max)^Vint), a huge savings for l max ~ 10 3 , as is the case for the WMAP and Planck missions. In Ref. [26] . 
it is found that to obtain convergence in the bispectrum estimator itself, N- lnt ~ 10 3 is more than adequate. We 
thus estimate that use of the real-space RNE will to a computational savings factor of 10 18 over the harmonic-space 
estimator. 
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